Online Clustering of Data Streams

Author

J. Beringer

Abstract

We consider the problem of clustering data streams. A data stream can roughly be thought of as a transient, continuously increasing sequence of time-stamped data. To maintain an up-to-date clustering structure, the incoming data must be analyzed online, tolerating no more than a constant time delay. For this purpose, we develop an efficient online version of the classical K-means method. The algorithm's efficiency is mainly due to a (discrete) Fourier transform of the original data, which both smooths and compresses the data.
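
The idea in the abstract, clustering streams by a few Fourier coefficients rather than by raw values, can be illustrated with a minimal sketch. It is not the author's implementation: the fixed window length, the z-normalization, the number of retained coefficients (`n_coeffs`), the farthest-point initialization, and all function names are assumptions made for this example. Warm-starting K-means from the previous centers is one plausible way to keep each update cheap; the abstract does not say how the online update is done.

```python
import numpy as np

def dft_features(windows, n_coeffs=4):
    """Compress each window to its first n_coeffs non-constant DFT coefficients.

    Assumption: every stream is represented by a fixed-length window of its
    most recent values, and each window is z-normalized before the transform.
    """
    w = np.asarray(windows, dtype=float)
    w = w - w.mean(axis=1, keepdims=True)
    std = w.std(axis=1, keepdims=True)
    w = w / np.where(std > 0, std, 1.0)
    # Index 0 is the constant term, which is zero after centering, so skip it.
    coeffs = np.fft.rfft(w, axis=1)[:, 1:n_coeffs + 1]
    # Stack real and imaginary parts into one real-valued feature vector.
    return np.hstack([coeffs.real, coeffs.imag])

def init_centers(points, k):
    """Deterministic farthest-point initialization (an assumption of this sketch)."""
    centers = [points[0]]
    for _ in range(k - 1):
        dists = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centers], axis=0
        )
        centers.append(points[dists.argmax()])
    return np.array(centers)

def assign(points, centers):
    """Return the index of the nearest center for every point."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, centers=None, n_iter=10):
    """Plain K-means; pass the previous centers to warm-start an update."""
    centers = init_centers(points, k) if centers is None else centers.copy()
    for _ in range(n_iter):
        labels = assign(points, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return assign(points, centers), centers

# Synthetic example: three noisy slow sines and three noisy fast sines.
rng = np.random.default_rng(0)
t = np.arange(64)
slow = np.sin(2 * np.pi * 1 * t / 64)
fast = np.sin(2 * np.pi * 3 * t / 64)
windows = np.array([slow] * 3 + [fast] * 3) + 0.1 * rng.standard_normal((6, 64))

features = dft_features(windows)          # 64 values per stream -> 8 features
labels, centers = kmeans(features, k=2)

# When new data arrive, recompute the features for the shifted windows and
# reuse the old centers so that only a few iterations are needed.
labels_next, centers_next = kmeans(features, k=2, centers=centers, n_iter=2)
```

In this sketch each 64-sample window is reduced to 8 numbers before any distance is computed. Dropping the higher-frequency coefficients is what gives both the smoothing and the compression that the abstract mentions.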

Similar resources

Probability Density Grid-based Online Clustering for Uncertain Data Streams

Most existing stream clustering algorithms combine an online component with an offline component. The disadvantage of such two-phase algorithms is that they cannot generate the final clusters online; accurate clustering results must be obtained through offline analysis. Furthermore, the clustering algorithms for uncertain data streams are unable to find clusters of arbitrary shapes accordi...

Online clustering of parallel data streams

In recent years, the management and processing of so-called data streams has become a topic of active research in several fields of computer science such as, e.g., distributed systems, database systems, and data mining. A data stream can roughly be thought of as a transient, continuously increasing sequence of time-stamped data. In this paper, we consider the problem of clustering parallel stre...

On clustering large number of data streams

Data streams and their applications appear in several fields such as physics, finance, medicine, environmental science, etc. As sensor technology improves, sensor data rates continue to increase. Consequently, analyzing data streams becomes ever more challenging. Fast online response is a must for applications that involve multiple data streams, especially when the number of data streams is lar...

Online-Data-Mining auf Datenströmen: Methoden zur Clusteranalyse und Klassifikation

• J. Beringer and E. Hüllermeier. Efficient instance based learning on data streams. Adaptive optimization of the number of clusters in fuzzy clustering. Fuzzy clustering of parallel data streams.

Benchmarking Stream Clustering Algorithms within the MOA Framework

In today’s applications, massive, evolving data streams are ubiquitous. To gain useful information from this data, real-time clustering analysis for streams is needed. A multitude of stream clustering algorithms have been introduced. However, assessing the effectiveness of such an algorithm is challenging, because so far no tool allows a direct comparison of these algorithms. We pre...
